(54) System and method for detecting a network failure



(57) An improved file system apparatus and method for minimizing the length of time a client system waits before declaring a data communication link disconnected. The apparatus and method dynamically modify a file system request time-out value based on the actual length of time required to service each file system request. In one embodiment, a time-out value is determined for each request type based on the actual response time and a buffer time for each request type. The response timer is based on readings from a system clock and therefore operates as a low-overhead process. A monitoring system periodically tests the server to ensure that a physical connection still exists.
Description

Background of the Invention 

1. Field of the invention 

The present invention relates to electronic data processing systems and, more particularly, to distributed data processing systems for accessing data from a remote server. Still more particularly, the present invention relates to apparatus and processes for monitoring low-level file system requests over networks of varying bandwidth.

2. Background and Related Art 

Individual computer systems are often connected to other computer systems using local area network (LAN) or wide area network (WAN) technology. Interconnected systems can share system resources such as disk storage and printers. Client/server systems are implemented in this environment by distributing the processing, storage or function between a client and a server workstation. The client workstation makes a request that is satisfied by a server workstation.

LAN/WAN networks have typically been implement- 
ed so that each workstation has a solid connection of 
defined bandwidth with the server. The solid connection 
and defined bandwidth provide relatively uniform ac- 
cess times between the client and server systems. 

Distributed terminal systems are implemented using asynchronous connections between a terminal and a computer system. The asynchronous connections can be over dedicated wires or through dial-up telephone lines. Asynchronous processing allows for great variation in communications speed. Each request over the system is acknowledged so that any disconnection or delay in transmission can be noted and handled by the system. Lost transmissions may be resent until the entire message is received. Asynchronous processing allows a greater variety of connection media, but is typically slower, with greater overhead, than directly connected LAN workstations.

The evolving network market has led to an increased number of methods for interconnecting workstations. One approach allows asynchronous connection into a LAN through telephone lines. This approach is found in the IBM LAN Distance Program Product, which allows a client workstation to dial into a LAN from a remote location. Implementation requires specific LAN Distance software at both the client and server workstations.

Another interconnection technology is infrared (IR) connection. Infrared Direct Access connection (IRDA) replaces traditional wiring with a wireless system that uses infrared signals to transmit data. One disadvantage of IRDA systems is that physical obstruction of the line-of-sight path causes intermittent disconnection of the infrared device. Software operating over IRDA links must be able to continue processing through intermittent disconnections.

Radio Frequency (RF) links are another wireless alternative for connecting to a LAN. RF signals are also subject to intermittent interruption.

Cellular telephone technology provides yet another wireless alternative for LAN connection. Cellular signals are subject to interruption due to switching or to physical obstruction by a tunnel or structure.

These technologies provide mechanisms for establishing data communication links to remote clients. Those mechanisms are incorporated into a number of mobile products used by an increasing number of people. Mobile products such as laptop or palmtop computer systems and personal digital assistants (PDAs) often use wireless data communication links to connect directly from the remote device to a server.

The computer acting as the server to the mobile clients typically includes a server file management system that enables client systems to store and access files on the server. The file management system is part of the server network operating system (NOS). Such systems include the IBM LAN Server Program Product and the Novell Netware Program Product. In addition, server file systems such as the Network File System (NFS) and the Andrew File System (AFS) are provided on servers based on the UNIX* Operating System. (UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Ltd.)

Existing server file systems compensate for temporary disconnections by assigning a time-out period to each low-level file system access request. If the request has not been satisfied within the time-out period, the system signals that the data communications link has become disconnected and further processing ceases.

Determining the appropriate time-out value for low-level file system requests can be difficult. If the time-out period is set too short, the system will signal disconnection when the signal has suffered only an intermittent interruption. Selecting a longer time-out period, however, may cause the system to wait for a potentially long period of time before detecting a true data communications link disconnection. Time-out values have typically been set higher than necessary to avoid false disconnection indications. Time-out value selection is further complicated by the fact that most servers must support both long and short duration time-outs concurrently, because they support mobile devices with different types of data communications links.

The technical problem is to find a time-out strategy that minimizes the time needed to detect an actual disconnection while properly supporting intermittent disconnections due to temporary communications link interruptions.
Summary of the Invention

The present invention is directed to providing a mechanism for dynamically varying file system request time-out values based on the actual characteristics of the network connection. The present invention is directed to a client-side apparatus and method for measuring the delay found in the data communications link being used and for dynamically modifying the time-out value based on the current delay characteristics.

The present invention is directed to a computer-implemented process for detecting network failure with minimal delay in a network system connecting a source device to one or more target devices, the network system operating over any one of a plurality of communication links, each having variable communication bandwidth and being subject to intermittent non-failure disconnection. The process comprises the following steps: initializing a network service request time-out period for one of the one or more target devices; and repeating the following steps for each of a plurality of network service requests to that target device: issuing a network service request over the communications link; signalling network failure if the network service request is not satisfied within the time-out period; measuring the network service request time if the network service request is satisfied; and modifying the time-out period in response to the network service request time.
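The repeated steps hinge on issuing one request and treating it as a network failure if no answer arrives within the current time-out. A minimal Python sketch of that single step follows, assuming a network service request can be modelled as a zero-argument callable; `issue_with_timeout` and `NetworkFailure` are hypothetical names, not part of any product described here. The elapsed time it returns is what the modifying step would feed back into the time-out.

```python
import concurrent.futures
import time


class NetworkFailure(Exception):
    """Signalled when a request is not satisfied within the time-out period."""


# Worker threads that carry out the (hypothetical) network service requests.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def issue_with_timeout(request, timeout):
    """Run `request` and return its measured service time in seconds.

    Raises NetworkFailure if the request does not complete within `timeout`
    seconds. Elapsed time is read from a monotonic clock, so wall-clock
    adjustments cannot distort the measurement.
    """
    start = time.monotonic()
    future = _pool.submit(request)
    try:
        future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only succeeds if the request has not started yet
        raise NetworkFailure(f"no response within {timeout} s") from None
    return time.monotonic() - start
```

Because a running Python thread cannot be interrupted, a slow request keeps running in the background after `NetworkFailure` is raised; a real client would also cancel the outstanding request at the transport level, as claim 5 describes.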

It is therefore an object of the present invention to measure the actual delay inherent in a data communication link established by a client workstation and to adjust file system request time-out values based on that measurement.

It is another object of the invention to provide an apparatus for differentiating between intermittent and full disconnection of a communication link and to minimize the time required to detect an actual disconnection.

It is still another object of the invention to provide a method for establishing separate time-out values for different types of file system requests in recognition of the processing delays inherent in each type of file system request.

It is yet another object of the present invention to provide a single file system request time-out strategy for multiple types of connections with differing bandwidths and frequencies of disconnection.

The foregoing and other objects, features and ad- 
vantages of the invention will be apparent from the fol- 
lowing more particular description of a preferred embod- 
iment of the invention, as illustrated in the accompany- 
ing drawing wherein like reference numbers represent 
like parts of the invention. 

Brief Description of the Drawings 

The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a system in which the preferred embodiment of the invention is practiced;

Figure 2 is a block diagram of a computer system 
in which the present invention is implemented; 

Figure 3 is a block diagram depicting the relationship between application program, operating system and file system programs;

Figure 4 is a timing diagram that illustrates the timing of a File System request across a network;


Figure 5 is a flowchart illustrating the steps of the present invention;

Figure 6 is a flowchart illustrating in greater detail the steps of the present invention in an alternate embodiment;

Figure 7 is a flowchart depicting the steps in the response monitor of the present invention; and


Figure 8 is a flowchart depicting the steps of the 
connection testing daemon. 

Detailed Description 


The preferred embodiment of the present invention is used in a network of computer systems. Figure 1 illustrates a network configuration of computers 100 in which the present invention may be practiced. A local area network (LAN) or wide area network (WAN) interconnects a server 104 with client workstations 106, 108 and 110. The clients are each connected through a data communications link. Client workstation 106 is connected using an infrared link. Client 108 is connected through a telephone or cellular telephone link. Client 110 is connected through dedicated network wiring. Each of these clients can expect different network delays and frequencies of intermittent disconnection. The preferred embodiment of the present invention operates with any of the above-mentioned data communication link types but is not limited to those. Other forms of radio or optical links can be employed. In addition, any form of network protocol may be used, including token ring and Ethernet protocols.

Each of the client and server workstations has a structure similar to that shown in Figure 2. The workstation 202 includes processor 204, memory 206, I/O controller 208, and communications controller 210. I/O controller 208 supports a number of devices such as a graphic display 214, a keyboard 216, and permanent and removable storage media 218 and 220. The storage media can be of any known type, including magnetic and optical disks or cartridges. Communications controller 210 manages communications over a data link connection 212. The present invention can be practiced with many different computer system configurations. The preferred embodiment is implemented on an IBM ThinkPad Computer System. (IBM and ThinkPad are trademarks of the IBM Corporation.)

The present invention allows an application program or system program to access data on a server through a communications link. Figure 3 illustrates the software structure of a system according to the preferred embodiment of the present invention. An application program 302 requests data for processing by issuing a data request to the operating system 304. The operating system is responsible for managing system resources and satisfying application and system requests for resources. The present invention can be practiced on operating systems such as the IBM OS/2 WARP Operating System, the Microsoft Windows NT operating system, and the UNIX operating system. Operating system 304 satisfies application or system file requests by accessing data storage 308. (Storage 308 can be any of the aforementioned data storage media in either permanently installed or removable configurations.) The operating system uses file system access services contained in the operating system or may use Installable File Services 310. Installable file services allow the user of the computer system to install particular file systems to support specific requirements of the user. Examples of installable file systems are the IBM High Performance File System (HPFS) and the IBM Mobile File Synch feature of the IBM Attachpak Program Product. LAN client software, such as the IBM LAN Requester, is also implemented as an installable file system that intercepts file system requests and passes them over the network to a server for processing.

An installable file system intercepts operating system file services requests and services each request using the particular services of the installable file system. The preferred embodiment of the present invention is implemented in the Mobile File Synch Installable File System. The Mobile File Synch IFS is designed to support mobile computing for users who use networks. When the user is connected via network link 314 to a LAN/WAN configuration, application file system requests are passed by the IFS through the network interface to the LAN/WAN server for servicing. Mobile File Synch includes a mechanism for locally caching data in use by the client system. If Mobile File Synch detects a disconnection of data link 314, it attempts to satisfy file system requests from local cache 312. While the preferred embodiment uses a file system with caching, the invention is not limited to such a system and can be used with any LAN client that intercepts operating system file system requests.
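The fallback from the server to the local cache can be illustrated with a small Python sketch. It assumes a hypothetical `remote_read(path)` callable that raises `TimeoutError` when the time-out expires; the class name `CachingRedirector` and the dictionary cache are illustrative stand-ins, not the Mobile File Synch implementation.

```python
class CachingRedirector:
    """Hypothetical redirector: forwards reads to the server while connected and
    serves them from a local cache once a disconnection has been detected."""

    def __init__(self, remote_read):
        self.remote_read = remote_read  # callable(path) -> bytes; raises TimeoutError on time-out
        self.cache = {}                 # path -> last data read from the server
        self.connected = True

    def read(self, path):
        if self.connected:
            try:
                data = self.remote_read(path)
            except TimeoutError:
                self.connected = False   # enter disconnected mode
            else:
                self.cache[path] = data  # keep a copy for later disconnected use
                return data
        if path in self.cache:
            return self.cache[path]      # satisfy the request from the local cache
        raise FileNotFoundError(path)
```

Leaving disconnected mode again is not handled here; in the text that is the job of the connection-testing daemon of Figure 8.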

The present invention differs from asynchronous file transfer systems in that it processes low-level file system requests. Asynchronous file transfers typically request that a specific file be transferred from a server to the client. The file transfer software monitors transmission and ensures that all blocks are sent and received. Some file transfer programs allow retransmission of missed blocks of data. The present invention services low-level file system requests, such as a request to read one record from a data file. These requests are issued by the application or system program 302, which has no knowledge of whether the data will be found locally or remotely. The present invention transparently services the request from a remote server. The remote server services the request in the same way it would service any other local data request. Direct servicing of requests avoids the delays inherent in cross-network transfer of data managed by the network software.

The present invention supports all types of low-level file system requests. Figure 4 illustrates the processing of a FileRead request from an application program. This request is issued by the application program to get additional data for processing and may be, for example, a request for the next record from a data file.

The application FileRead request is passed to the operating system, which issues a file system read (FSRead) to the file system services. The installable file system intercepts this request and issues an FSRead to the server across the network. The FSRead according to the present invention is issued with a dynamic time-out value that is determined in the manner set forth in greater detail below. The FSRead with time-out is transmitted over the data communications link to the server for processing. The server issues an FSRead to the physical device, which returns the requested data. The data is returned to the application via the network, installable file system and operating system.

Time delays are present in the FSRead processing, as indicated in Figure 4. In particular, the delay between the IFS issuing the FSRead request to the server and receipt of the response is indicated as t_r. If the time t_r exceeds the time-out value specified by the FSRead with time-out, then the installable file system signals a disconnection. As long as t_r is less than the time-out value, the IFS takes no action to disconnect, even if a temporary disconnection in fact occurs. Figure 4 illustrates the components of t_r, including t_1 and t_3, the network transmission delays, and t_2, the delay required to service the FSRead request. Because each type of request (FSRead, FSWrite, etc.) requires a different service time, the total delay, and hence the time-out value, preferably varies by type of request.
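Read as arithmetic, and assuming the components simply add (t_r = t_1 + t_2 + t_3, which the figure description suggests but does not state), the disconnection test is a single comparison. The Python sketch below uses hypothetical delay values; only the service component t_2 changes with the request type, which is why a per-type time-out can be tighter than a shared one.

```python
def response_time(t1, t2, t3):
    """t_r as the sum of outbound delay t1, service time t2 and return delay t3 (seconds)."""
    return t1 + t2 + t3


def disconnect_signalled(t_r, timeout):
    """The IFS signals a disconnection only when t_r exceeds the time-out."""
    return t_r > timeout


# Hypothetical service times: identical link delays, different t2 per request type.
service_time = {"FSRead": 1.0, "FSWrite": 3.0}
for kind, t2 in service_time.items():
    t_r = response_time(0.5, t2, 0.5)
    print(kind, t_r, disconnect_signalled(t_r, timeout=2.5))
```

With a shared 2.5 second time-out, the FSRead completes in time while the FSWrite, which simply takes longer to service, would be mistaken for a disconnection.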

The present invention dynamically varies the time- 
out value by measuring the actual time required to serv- 
ice a request. The preferred embodiment sets upper and 
lower bounds on the time-out to provide a minimum level 
of intermittent disconnection protection and a maximum 
wait for actual disconnection. The preferred embodi- 
ment allows these parameters to be set by the system 
user to adapt to particular situations. 

EP 0 767 558 A1

The process of the present invention is shown in Figure 5. The process starts 502 and begins by setting the minimum, maximum and current time-out values. The preferred embodiment uses a minimum time-out value of 15 seconds and a maximum of 60 seconds. Initially, the current time-out value is set to the maximum. The system next attempts the initial connection to the server file system. A connection timer 508 is started when the connection request is sent. If a connection is not completed before the expiration of the time-out period, the system signals failure to connect and the file system operates in disconnected mode 514 until a connection is established. If the connection is successfully completed 510, the length of time required to connect is measured from the connection timer 512. The preferred embodiment uses readings from the system's 31.25 millisecond clock to determine elapsed time (see Figure 7). Other connection timers could be used, for example, an asynchronous DOS timer.
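With a 31.25 millisecond clock, one tick is 1/32 of a second, so elapsed time is the difference between two tick readings multiplied by 31.25 ms. The Python sketch below assumes a 32-bit wrapping tick counter (the counter width is an assumption, not stated in the text) and uses modular subtraction so that a single wrap-around between the two readings is still measured correctly.

```python
TICK_MS = 31.25  # period of the system clock described above, in milliseconds


def elapsed_ms(start_ticks, end_ticks, counter_bits=32):
    """Elapsed milliseconds between two readings of a wrapping tick counter.

    counter_bits is an assumed counter width; the modulo handles one
    wrap-around between the first and the second reading.
    """
    ticks = (end_ticks - start_ticks) % (1 << counter_bits)
    return ticks * TICK_MS
```

For example, readings of 100 and 132 are 32 ticks apart, i.e. one second. Measurements are quantized to 31.25 ms, which is small against a 15 second minimum time-out, and taking two clock readings is far cheaper than running a separate timer per request.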

Next, the connection time is compared 518 to the 
minimum time-out value. If it is less than or equal to the 
minimum time-out value the current time-out value is set 
to the minimum time-out value 520. Otherwise, the cur- 
rent time-out value is set to be the connection time plus 
a specified buffer time 522. In the preferred embodi- 
ment, the buffer time differs for each different type of file 
system call. 

The current time-out value set at the time of connection is used for the next file system request 524 and is then adjusted based on the response time for that request. Before sending the file system request to the server, the file system of the present invention tests whether a connection exists 526. If no connection exists, disconnection is signalled and the file system enters disconnected mode 514. If a connection exists, the file system request with time-out value is sent 527 to the server. The file system request timer is started 530, and the request time is measured upon successful completion 532. The system tests whether the file system request is satisfied within the time-out period 528. If not, the system enters disconnected mode 514. Otherwise, the actual request service time is calculated. The steps of dynamically adjusting the time-out value 518-522 are repeated for each file system request.
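The flow from step 518 through step 528 can be traced with a deterministic Python simulation. It assumes a single buffer value, times given in seconds, and `None` standing for a request that never completes (which also stands in for the connection test 526 failing); the function names are hypothetical. It caps the new time-out at the maximum, following claim 4, which the prose above does not mention.

```python
def adjust(measured, min_timeout, max_timeout, buffer):
    """Steps 518-522: new current time-out from a measured connection or request time."""
    if measured <= min_timeout:
        return min_timeout                          # step 520
    return min(measured + buffer, max_timeout)      # step 522, capped as in claim 4


def simulate_figure5(connect_time, request_times,
                     min_timeout=15.0, max_timeout=60.0, buffer=5.0):
    """Return the final mode and the time-out used for each request sent."""
    timeout = max_timeout                           # current time-out starts at the maximum
    if connect_time is None or connect_time > timeout:
        return "disconnected", []                   # failure to connect: mode 514
    timeout = adjust(connect_time, min_timeout, max_timeout, buffer)
    used = []
    for t in request_times:
        used.append(timeout)                        # request sent with current time-out (527)
        if t is None or t > timeout:
            return "disconnected", used             # not satisfied in time (528 -> 514)
        timeout = adjust(t, min_timeout, max_timeout, buffer)
    return "connected", used
```

For example, a 20 second connection gives a first time-out of 25 seconds; a following 12 second request drops the time-out to the 15 second minimum, so a later 20 second request would be treated as a disconnection.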

In the preferred embodiment, a buffer value is established for each file system request type. Each file system request type is given an individual time-out value based on actual request servicing time. The buffer value and time-out value for each file system request type are stored in a table that is accessed whenever a request of that type is issued. Use of the table of buffer and time-out values for file system requests is illustrated in the diagram of Figure 6. Alternate embodiments are based on a single buffer value and a single time-out value. The time-out value of these alternate embodiments must allow for greater variation due to the many service types: the buffer value must be large enough to enable processing of the longest file service request. This results in less than optimal disconnection recognition for shorter file system requests.
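The per-type table can be sketched in Python as a dictionary keyed by request type. The request-type names and buffer values used with it are hypothetical; the 15 and 60 second defaults are the preferred-embodiment bounds given for Figure 5, and the update rule is the minimum/buffer/maximum rule of steps 518 to 522 with the cap from claim 4.

```python
from dataclasses import dataclass


@dataclass
class TimeoutEntry:
    buffer: float   # buffer time for this request type (seconds)
    timeout: float  # current time-out for this request type (seconds)


class TimeoutTable:
    """Per-request-type buffer and time-out values, consulted on every request."""

    def __init__(self, buffers, min_timeout=15.0, max_timeout=60.0):
        self.min_timeout = min_timeout
        self.max_timeout = max_timeout
        # Every request type starts at the maximum time-out.
        self.entries = {kind: TimeoutEntry(buf, max_timeout)
                        for kind, buf in buffers.items()}

    def timeout_for(self, kind):
        return self.entries[kind].timeout

    def record(self, kind, service_time):
        """Adjust only this request type's time-out from its measured service time."""
        entry = self.entries[kind]
        if service_time <= self.min_timeout:
            entry.timeout = self.min_timeout
        else:
            entry.timeout = min(service_time + entry.buffer, self.max_timeout)
```

Recording a measured FSRead time leaves the FSWrite entry untouched. With a single shared entry instead, the buffer would have to be as large as the largest per-type buffer, which is the less than optimal recognition described above.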



The file system remains in disconnected mode until it receives an indication 516 that the network connection has been restored. The indication can be generated in several ways. In the preferred embodiment of the invention, the file system periodically polls the server to determine whether the file system is connected to the server (Figure 8). The file system of the preferred embodiment issues a QueryPath request for the directory to which it is intended to be connected. The process blocks until a response is received. The task sleeps for five seconds and then tests for success. If not successful, disconnected mode is signalled. If successful, connected mode is signalled.
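A polling daemon of this kind might look like the Python sketch below. The `query_path` callable is a hypothetical stand-in for the QueryPath request and is assumed to block until the server answers, returning whether it succeeded. For simplicity the sketch records the result before sleeping rather than after; `threading.Event.wait` provides the five-second sleep while still allowing the daemon to be stopped promptly.

```python
import threading


class ConnectionMonitor:
    """Sketch of a connection-testing daemon that polls the server periodically."""

    def __init__(self, query_path, directory, interval=5.0):
        self.query_path = query_path  # hypothetical: callable(directory) -> bool, blocks for a reply
        self.directory = directory    # directory the client intends to be connected to
        self.interval = interval      # seconds between tests (five in the text)
        self.connected = False
        self._stop = threading.Event()

    def check(self):
        """One test: issue the query and signal connected or disconnected mode."""
        self.connected = bool(self.query_path(self.directory))
        return self.connected

    def run(self):
        """Daemon loop: test the connection, then sleep until the next test or stop()."""
        while not self._stop.is_set():
            self.check()
            self._stop.wait(self.interval)

    def stop(self):
        self._stop.set()
```

A client would typically start it with `threading.Thread(target=monitor.run, daemon=True).start()` and read `monitor.connected` before leaving disconnected mode.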

Alternatively, the server can send a signal whenever a connection to the client is reestablished.

As indicated above, aspects of this invention pertain to specific "method functions" implementable on computer systems. In an alternate embodiment, the invention may be implemented as a computer program product for use with a computer system. Those skilled in the art should readily appreciate that programs defining the functions of the present invention can be delivered to a computer in many forms, including, but not limited to:

(a) information permanently stored on non-writable storage media (e.g. read-only memory devices within a computer, such as a semiconductor ROM, or CD-ROM disks readable by a computer I/O attachment);

(b) information alterably stored on writable storage media (e.g. floppy disks and hard drives); or

(c) information conveyed to a computer through communication media, such as a network and telephone networks via a modem.

It should be understood, therefore, that such media, when carrying computer readable instructions that direct the method functions of the present invention, represent alternate embodiments of the present invention.

It will be understood from the foregoing description that various modifications and changes may be made in the preferred embodiment of the present invention without departing from its true spirit. In particular, while file system requests have been used in the description, requests for other shared resources such as serial devices, printers and processor time could be similarly handled. This description is intended for purposes of illustration only and should not be construed in a limiting sense. The scope of this invention should be limited only by the language of the following claims.

Claims

1. A computer implemented process for detecting network failure with minimal delay in a network system connecting a source device to one or more target devices, said network system operable over any one of a plurality of communication links each having variable communication bandwidth and being subject to intermittent non-failure disconnection, the process comprising the steps of:

initializing a network service request time-out period for one of said one or more target devices;

repeating the following steps for each of a plurality of network service requests to said one of said one or more target devices:

issuing a network service request over said communications link;

signalling network failure if said network service request is not satisfied within said time-out period;

measuring network service request time if said network service request is satisfied; and

modifying said time-out period in response to said network service request time.

2. The process of claim 1, wherein the step of initializing a network service request time-out period comprises the steps of:

receiving a minimum and a maximum time-out value for each of said target devices;

setting said network service request time-out period equal to said maximum time-out value for said one of said one or more target devices.

3. The process of claim 1, wherein the source device contains a system clock, and wherein the step of measuring network service request time comprises the steps of:

reading said system clock and storing a first system clock value in a storage area;

reading said system clock to determine a second system clock value upon successful completion of said network service request before the end of the time-out period; and

determining network service request time as the difference between said second system clock value and said first system clock value.

4. The process of claim 2, wherein the step of modifying said time-out period in response to said network service request time comprises the steps of:

setting said time-out period to the minimum time-out value if said network service request time is less than or equal to said minimum time-out value;

setting said time-out period to the lesser of said network service request time plus a service request buffer interval or said maximum time-out value, if said network service request time is greater than said minimum time-out value.

5. The process of claim 1, wherein the step of signalling network failure comprises the steps of:

initializing an independent timer with said time-out period;

starting said independent timer when said network service request is issued;

cancelling said independent timer if said network service request is satisfied before said independent timer completes the time-out period; and

cancelling the network service request, cancelling said independent timer, and signalling network failure if said independent timer completes the time-out period before the network service request is satisfied.

6. A computer program product for use with a distributed computer system connected to a network system, said computer program product comprising:

a computer usable medium having computer readable program code means embodied in said medium for causing detection of network failure with minimal delay in a network system connecting a source device to one or more target devices, said network system operable over any one of a plurality of communication links each having variable communication bandwidth and being subject to intermittent non-failure disconnection, said computer program product having:

computer readable program code means for causing a computer to initialize a network service request time-out period for one of said one or more target devices;

computer program product means for causing a computer system to repeat the following steps for each of a plurality of network service requests to said one of said one or more target devices:

computer program product means for causing a computer system to issue a network service request over said communications link;

computer program product means for causing a computer system to signal network failure if said network service request is not satisfied within said time-out period;

computer program product means for causing a computer system to measure network service request time if said network service request is satisfied; and

computer program product means for causing a computer system to modify said time-out period in response to said network service request time.

7. The computer program product of claim 6, wherein the computer program product means for causing a computer system to initialize a network service request time-out period comprises:

computer program product means for causing a computer system to receive a minimum and a maximum time-out value for each of said one or more target devices;

computer program product means for causing a computer system to set said network service request time-out period equal to said maximum time-out value of said one of said one or more target devices.

8. The computer program product of claim 6, wherein the source device contains a system clock, and wherein the computer program product means for causing a computer system to measure network service request time comprises:

computer program product means for causing a computer system to read said system clock and store a first system clock value in a storage area;

computer program product means for causing a computer system to read said system clock to determine a second system clock value upon successful completion of said network service request before the end of the time-out period; and

computer program product means for causing a computer system to determine network service request time as the difference between said second system clock value and said first system clock value.

9. The computer program product of claim 6, wherein said network service requests are low-level file system requests.

10. The computer program product of claim 6, further comprising:

computer program product means for causing a computer system to set the source device to a disconnected state in response to the signalling of network failure.